-
Notifications
You must be signed in to change notification settings - Fork 102
Optimize unique tags #761
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Optimize unique tags #761
Conversation
We have two general code paths: 1. Merge two Tags in to one 2. Merge two Tags in to one, excluding certain tags We used to handle number 1 by passing an empty map to number 2, however these can be seen as fast (add static tags) and slow (add static tags + perform filtering) code-paths. The fast code path is now optimized by performing an array search instead of using a map to track things that have been seen. This is more efficient for small values. There is diminishing returns eventually, but that's with a high number of tags, which is an anti-pattern. The primary benefit is in removing the map allocation. The slow code path is unchanged, but could be improved by ensuring the map is sufficiently large, or using the same linear search method to check if a tag should be excluded. Note: this benchmark isn't perfect because we clone the source arrays, but that can be identified in the profile. BenchmarkUniqueTagsPractical/original-22 40025511 300.1 ns/op 304 B/op 3 allocs/op BenchmarkUniqueTagsPractical/prealloc-22 38560572 308.3 ns/op 304 B/op 3 allocs/op BenchmarkUniqueTagsPractical/array-search-22 56932317 210.9 ns/op 304 B/op 3 allocs/op
| b.Run(name, func(b *testing.B) { | ||
| b.ReportAllocs() | ||
| for b.Loop() { | ||
| _ = f(slices.Clone(t1), slices.Clone(t2)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically we don't need to clone t2, but I had the code written...
hstan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems I discovered a bug when trying to run a unit test to understand how the uniqueTagsSimple works:
with input:
{
name: "multiple duplicates",
t1: gostatsd.Tags{"a", "b", "a", "c", "b"},
t2: gostatsd.Tags{"a", "b", "c"},
expected: gostatsd.Tags{"a", "b", "c"},
},
the output of uniqueTagsSimple(t1, t2) is gostatsd.Tags{"a", "b", "b", "c"} instead of gostatsd.Tags{"a", "b", "c"}
|
|
||
| last := len(t1) | ||
| for idx := 1; idx < last; { // start at 1 because we know the first item will be unique. | ||
| if slices.Contains(t1[:idx-1], t1[idx]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should here be t1[:idx] to reach the last element? since the loop condition is idx < len(t1)?
(see my other comment for a failed test case)
We have two general code paths:
We used to handle number 1 by passing an empty map to number 2, however these can be seen as fast (add static tags) and slow (add static tags + perform filtering) code-paths.
The fast code path is now optimized by performing an array search instead of using a map to track things that have been seen. This is more efficient for small values. There is diminishing returns eventually, but that's with a high number of tags, which is an anti-pattern.
The primary benefit is in removing the map allocation.
The slow code path is unchanged, but could be improved by ensuring the map is sufficiently large, or using the same linear search method to check if a tag should be excluded.
Note: this benchmark isn't perfect because we clone the source arrays, but that can be identified in the profile.